
Expected stochastic occurrences of unread genomic sequence stretches.

Author: Shai Shaham (36270)
Publication venue
Publication date
Field of study

The expected number of unread sequences of a given minimum length is plotted for a genome of size 100 Mbp and 40 million 32-nt sequence reads. The probability, P, of obtaining an unread sequence of at least length L is equal to the probability of not obtaining any 32-bp sequence fragments that cover a stretch of length L. This is given by , where G is the genome size, S is the sequence read length (32), and n is the number of sequence reads examined. The expected number of deletions is then given approximately by .
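
As a rough illustration of this reasoning, the sketch below assumes that reads start uniformly at random along the genome, so that a stretch of length L stays unread when none of the n reads starts at any of the L + S − 1 positions that would overlap it. Under that assumption P ≈ (1 − (L + S − 1)/G)^n, and the expected count is approximated as G·P. Both expressions and the function names are assumptions made for illustration, not the caption's own equations.

```python
import math

def p_unread(L, G=100_000_000, S=32, n=40_000_000):
    """Probability that a given stretch of length L is overlapped by no read.

    Assumption (not from the source): read starts are uniform over G positions,
    and L + S - 1 start positions would overlap the stretch.
    """
    covering_starts = L + S - 1
    # log1p keeps precision when covering_starts / G is tiny
    return math.exp(n * math.log1p(-covering_starts / G))

def expected_unread(L, G=100_000_000, S=32, n=40_000_000):
    """Approximate expected number of unread stretches of at least length L.

    Assumption: one candidate stretch per genome position, so the count is G * P.
    """
    return G * p_unread(L, G, S, n)

if __name__ == "__main__":
    for L in (1, 5, 10, 20):
        print(L, expected_unread(L))
```

With the caption's values (G = 100 Mbp, S = 32, n = 40 million), this approximation falls off roughly exponentially as L grows.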

FigShare

Determining parameters for maximally efficient screens.

Author: Shai Shaham (36270)

Graphs of n_act vs. log10(y) are plotted for three different values of W. For n_req = 5,000 (horizontal line), the minimal value of W is 37,642. log10(y_max) is indicated. Plots were generated by the program Mathematica 5.0 (Wolfram Research).

FigShare

File structure of the ‘galign’ folder.

Author: Shai Shaham (36270)

Yellow boxes, folders. Blue circles, executable files. Green hexagons, accessory text files. SNP_results contains the output of SNP_search. Alignment_results contains the output of Alignment_tool. Sequence_reads contains the output of Format_convert. Deletion_results contains the output of Deletion_search. The output of Genome_assemble is located in the Genome_sequences folder. A pre-assembled C. elegans genome (version 195) is distributed with the current software package. Feature_locations contains information about exons, introns and intergenic regions (see text) as well as the genetic code table for amino-acid predictions.

FigShare

The effects of different parameters on the number of haploid genomes screened.

Author: Shai Shaham (36270)

(A–C) Plots assuming all F1 animals are placed together on a single plate. p = 0.75, unless otherwise indicated. All plots were generated using the program Mathematica 5.0 (Wolfram Research). (A) Black line, graph of n_act/N vs. log10(y) for N = 1,000. Blue and red lines, graphs of asymptotes and their equations. (B) Graphs of n_act/N vs. log10(y). Different colors indicate n_act/N for different specified values of N. Black line, Poisson approximation; colored lines, exact solutions. Inset, magnification of the graph for the region of log10(y) between 0.3 and 0.5. For each value of N, y can only take on values such that 1/(N−1) ≤ y ≤ N−1. Furthermore, although for illustration purposes we have drawn the curves as continuous, y is not a continuous variable, and treating it as such only works for large N. This is most obvious for N = 10, where y can only take on the values 1/9, 1/4, 3/7, 2/3, 1, 3/2, 7/3, 4, and 9. (C) Graphs of n_act/N vs. log10(y) for varying values of p, as defined in the text. Graphed using the Poisson approximation. (D) Graphs depicting the fractional error incurred when using the Poisson approximation to estimate n_act/N for screens in which one F1 animal is plated per plate. Although graphs are continuous, only integer values of y are relevant. Also note that the smallest allowable value of y is chosen so that at least 1 F2 animal is chosen per plate.
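
The nine values listed for N = 10 are consistent with y = k/(N − k) for k = 1 to N − 1, which also reproduces the bounds 1/(N−1) ≤ y ≤ N−1. That reading is inferred from the listed values and is not stated in the caption; the function name below is hypothetical.

```python
from fractions import Fraction

def allowed_y(N):
    """Enumerate candidate y values for a given N, assuming y = k / (N - k).

    Assumption: this form is inferred from the values listed for N = 10.
    """
    return [Fraction(k, N - k) for k in range(1, N)]

if __name__ == "__main__":
    print([str(y) for y in allowed_y(10)])
    # ['1/9', '1/4', '3/7', '2/3', '1', '3/2', '7/3', '4', '9']
```

Exact fractions make the discreteness visible: for small N the allowed values are sparse and unevenly spaced on a log scale, which is why a continuous curve is only a good picture for large N.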

FigShare

Assessing ‘galign’ predicted polymorphisms by direct sequencing.

Author: Shai Shaham (36270)

N/A, not applicable. Bold words in the last column reflect positions at which a sequence change was predicted by ‘galign’. (a) Alteration predicted by Deletion_search; sequenced change is a G-to-C substitution. (b) galign read both wild-type and mutant sequences here. The mutant sequence was confirmed by sequencing. (c) Alteration predicted by Deletion_search; sequenced change is a C-to-G substitution. (d) Alteration predicted by Deletion_search; sequenced change is an A-to-G substitution. (e) Alteration predicted by Deletion_searc.

FigShare

General genetic screening scheme in C. elegans.

Author: Shai Shaham (36270)

(A) P0 animals are mutagenized and allowed to self-fertilize to produce F1 animals. To identify recessive mutations, F1 animals are allowed to self-fertilize to produce the F2 generation. In this paper we consider the case of n F1 animals giving rise to m F2 animals. (B) Plots describing the probability, π(n), that at least one of n F1 animals screened following mutagenesis by EMS (r = 1,250) is heterozygous for a loss-of-function mutation in a gene of interest. The parameter a is as defined in the text. The plots were generated by the program Mathematica 5.0 (Wolfram Research), using n as a continuous variable.

FigShare

‘galign’ output files.

Author: Shai Shaham (36270)

(A) A portion of a ‘galign’ alignment tool output file indicating the numbers of wild-type and mutant reads at a given position, as well as the corresponding wild-type and mutant sequences, which are displayed when a mutation is detected. (B) A portion of a SNP_search output file is depicted for a search involving exonic sequence substitutions. Position, position with respect to exon start site. Chrom. Pos., position with respect to the indicated chromosome. WT reads, number of wild-type reads at the given position. Mut reads, number of mutant reads at the given position. (C) A portion of a Deletion_search output file is depicted for a search involving deletions spanning exons. Start, End, the start and end coordinates of the deletion with respect to the first nucleotide of the indicated exon. Gen. Pos. Start, Gen. Pos. End, start and end coordinates of the deletion with respect to the indicated chromosome. The Comments column is used to highlight features indicative of true deletions and insertions.

FigShare

‘galign’ alignment algorithm.

Author: Shai Shaham (36270)

A sequence read is divided into three fragments, A, B, and C (see Results, http://www.plosone.org/article/info:doi/10.1371/journal.pone.0007188#s2). The algorithm starts at START. seq(A), the sequence of fragment A. seq(a), a genomic sequence matching seq(A) and located at position ‘a’ in the genome. a', an alternate genomic location containing seq(A). L, length of sequence read. Yellow boxes, decision nodes. Green boxes, algorithm repeat nodes. Red boxes, algorithm end points.
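
The flowchart itself is not reproduced in this caption, so the following is only a generic seed-and-verify sketch built from the terms defined above: fragment A is used as an exact-match seed, each genomic location of seq(A) is tried in turn, and the full read of length L is compared at that location. The equal-thirds split, the mismatch threshold, and all function names are assumptions for illustration and should not be read as the ‘galign’ algorithm.

```python
def split_read(read):
    """Split a read into three contiguous fragments A, B, C.

    Assumption: equal thirds; the caption does not give fragment lengths.
    """
    third = len(read) // 3
    return read[:third], read[third:2 * third], read[2 * third:]

def find_all(genome, fragment):
    """Yield every start position of fragment in genome, overlaps included."""
    pos = genome.find(fragment)
    while pos != -1:
        yield pos
        pos = genome.find(fragment, pos + 1)

def align_read(genome, read, max_mismatches=1):
    """Return (position, mismatches) for the first seed location that verifies.

    Each location a of seq(A) is tried in order; if the full read does not
    match there within max_mismatches (hypothetical threshold), the next
    location a' is tried. Returns None when no location verifies.
    """
    seq_a, _, _ = split_read(read)
    L = len(read)
    for a in find_all(genome, seq_a):
        candidate = genome[a:a + L]
        if len(candidate) < L:
            continue  # read would run off the end of the genome
        mismatches = sum(1 for x, y in zip(read, candidate) if x != y)
        if mismatches <= max_mismatches:
            return a, mismatches
    return None

if __name__ == "__main__":
    genome = "GATTACAGATTACACCGGTTAACC"
    print(align_read(genome, "GATTACACC"))
    print(align_read(genome, "GATTACACG"))
```

In the toy genome, seq(A) = "GAT" occurs at two locations; the first fails verification, so the sketch falls through to the alternate location, mirroring the role of a' in the caption.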

FigShare

Optimal F2-to-F1 screening ratios and screen efficiency calculations.

Author: Shai Shaham (36270)

(A) Contour plots examining maximal screen efficiency, ε_max, as a function of α and γ, for different values of p, for screens where all F1 animals are plated on one or a small number of plates. Plots generated using the program MATLAB (MathWorks). (B) Graphs examining fold increase in work performed as screening ratio (m/n) deviates from its optimal value, for screens where all F1 animals are plated on one or a small number of plates and α/γ = 10. (C) Graphs depicting maximal screen efficiencies as a function of α for screens in which F1 animals are plated individually. In these graphs γ = 1, which is the most common value for this screening mode. (D) Graphs of the optimal F2-to-F1 screening ratios (m/n) for different values of p as functions of α/γ, for screens where all F1 animals are plated on one or a small number of plates. Note that the vertical axis is the natural log of m/n and not the base 10 log. (E) Graphs of the optimal F2-to-F1 screening ratios (m/n) for different values of p as functions of α, for screens in which F1 animals are plated individually. In these graphs γ = 1, which is the most common value for this screening mode.

FigShare

Algorithm for performing an optimal genetic screen.

Author: Shai Shaham (36270)

The flowchart begins in the top left corner at “START”. All parameters and equations are described and derived in the text. Parameters of relevance are also described in the Glossary portion of the figure. Diamond shapes indicate steps where a choice must be made.

FigShare
